{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "xfcSMZEIFA_l"
   },
   "source": [
    "# ML with TensorFlow Extended (TFX) -- Part 2\n",
    "The puprpose of this tutorial is to show how to do end-to-end ML with TFX libraries on Google Cloud Platform. This tutorial covers:\n",
    "1. Data analysis and schema generation with **TF Data Validation**.\n",
    "2. Data preprocessing with **TF Transform**.\n",
    "3. Model training with **TF Estimator**.\n",
    "4. Model evaluation with **TF Model Analysis**.\n",
    "\n",
    "This notebook has been tested in Jupyter on the Deep Learning VM.\n",
    "\n",
    "## 0. Setup Python and Cloud environment\n",
    "\n",
    "Apache Beam support for Python 3 is in alpha at the moment, so we'll do this notebook in Python 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -m pip install --upgrade grpcio_tools tensorflow_data_validation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#%%bash\n",
    "# install from source to get latest bug fixes in.\n",
    "#git clone https://github.com/apache/beam\n",
    "#cd beam/sdks/python\n",
    "#python3 setup.py sdist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#%pip install -q --upgrade './beam/sdks/python/dist/apache-beam-2.13.0.dev0.tar.gz[gcp]'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tornado version: 5.1.1\n",
      "Python version: 2.7.13\n",
      "TF version: 1.13.1\n",
      "TFT version: 0.13.0\n",
      "TFDV version: 0.13.1\n",
      "Apache Beam version: 2.11.0\n"
     ]
    }
   ],
   "source": [
    "import apache_beam as beam\n",
    "import platform\n",
    "import tensorflow as tf\n",
    "import tensorflow_data_validation as tfdv\n",
    "import tensorflow_transform as tft\n",
    "import tornado\n",
    "\n",
    "print('tornado version: {}'.format(tornado.version))\n",
    "print('Python version: {}'.format(platform.python_version()))\n",
    "print('TF version: {}'.format(tf.__version__))\n",
    "print('TFT version: {}'.format(tft.__version__))\n",
    "print('TFDV version: {}'.format(tfdv.__version__))\n",
    "print('Apache Beam version: {}'.format(beam.__version__))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT = 'cloud-training-demos'    # Replace with your PROJECT\n",
    "BUCKET = 'cloud-training-demos-ml'  # Replace with your BUCKET\n",
    "REGION = 'us-central1'              # Choose an available region for Cloud MLE\n",
    "\n",
    "import os\n",
    "\n",
    "os.environ['PROJECT'] = PROJECT\n",
    "os.environ['BUCKET'] = BUCKET\n",
    "os.environ['REGION'] = REGION\n",
    "\n",
    "## ensure we're using python2 env\n",
    "os.environ['CLOUDSDK_PYTHON'] = 'python2'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Updated property [core/project].\n",
      "Updated property [compute/region].\n",
      "Updated property [ml_engine/local_python].\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "gcloud config set project $PROJECT\n",
    "gcloud config set compute/region $REGION\n",
    "\n",
    "## ensure we predict locally with our current Python environment\n",
    "gcloud config set ml_engine/local_python `which python`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img valign=\"middle\" src=\"images/tfx.jpeg\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "l9u699PmHJXU"
   },
   "source": [
    "### Flights dataset\n",
    "\n",
    "We'll use the flights dataset from the book [Data Science on Google Cloud Platform](http://shop.oreilly.com/product/0636920057628.do)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 51
    },
    "colab_type": "code",
    "id": "ksuSTsysHfZV",
    "outputId": "87adfbf0-be77-4d81-9162-5a2f9feffd90"
   },
   "outputs": [],
   "source": [
    "DATA_BUCKET = \"gs://cloud-training-demos/flights/chapter8/output/\"\n",
    "TRAIN_DATA_PATTERN = DATA_BUCKET + \"train*\"\n",
    "EVAL_DATA_PATTERN = DATA_BUCKET + \"test*\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "CSV_COLUMNS = ('ontime,dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay' + \n",
    "               ',carrier,dep_lat,dep_lon,arr_lat,arr_lon,origin,dest').split(',')\n",
    "TARGET_FEATURE_NAME = 'ontime'\n",
    "DEFAULTS     = [[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],\\\n",
    "                ['na'],[0.0],[0.0],[0.0],[0.0],['na'],['na']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kzEuipT4G0JE"
   },
   "source": [
    "## 2. Data Preprocessing\n",
    "For data preprocessing and transformation, we use [TensorFlow Transform](https://www.tensorflow.org/tfx/guide/tft) to perform the following:\n",
    "1. Implement transformation logic in **preprocess_fn**\n",
    "2. **Analyze and transform** training data.\n",
    "4. **Transform** evaluation data.\n",
    "5. Save transformed **data**, transform **schema**, and transform **logic**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "uzEWJDIbRcGQ"
   },
   "source": [
    "### 2.1 Implement preprocess_fn "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "I6svvdw3RhG2"
   },
   "outputs": [],
   "source": [
    "def make_preprocessing_fn(raw_schema):\n",
    "\n",
    "  def preprocessing_fn(input_features):\n",
    "\n",
    "    processed_features = {}\n",
    "\n",
    "    for feature in raw_schema.feature:\n",
    "      feature_name = feature.name\n",
    "      \n",
    "      if feature_name in [TARGET_FEATURE_NAME]:\n",
    "        processed_features[feature_name] = input_features[feature_name]\n",
    "      elif feature.type == 1:\n",
    "        # Extract vocabulary and integerize categorical features.\n",
    "        processed_features[feature_name+\"_integerized\"] = (\n",
    "            tft.compute_and_apply_vocabulary(input_features[feature_name], vocab_filename=feature_name))\n",
    "      else:\n",
    "        # normalize numeric features.\n",
    "        processed_features[feature_name+\"_scaled\"] = tft.scale_to_z_score(input_features[feature_name])\n",
    "\n",
    "    # Bucketize some of the numeric features using quantiles.\n",
    "    quantiles = tft.quantiles(input_features[\"distance\"], num_buckets=5, epsilon=0.01)\n",
    "    processed_features[\"distance_bucketized\"] = tft.apply_buckets(\n",
    "      input_features[\"distance\"], bucket_boundaries=quantiles)\n",
    "\n",
    "    return processed_features\n",
    "\n",
    "  return preprocessing_fn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "4rMiQew8Z8V7"
   },
   "source": [
    "### 2.2 Implement the Beam pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "YSjlcdLNZ7FX"
   },
   "outputs": [],
   "source": [
    "def run_pipeline(args):\n",
    "  import tensorflow_transform as tft\n",
    "  import tensorflow_transform.beam as tft_beam\n",
    "  import tensorflow_data_validation as tfdv\n",
    "  from tensorflow_transform.tf_metadata import dataset_metadata\n",
    "  from tensorflow_transform.tf_metadata import dataset_schema\n",
    "  from tensorflow_transform.tf_metadata import schema_utils\n",
    "    \n",
    "  pipeline_options = beam.pipeline.PipelineOptions(flags=[], **args)\n",
    "    \n",
    "  raw_schema_location = args['raw_schema_location']\n",
    "  raw_train_data_location = args['raw_train_data_location']\n",
    "  raw_eval_data_location = args['raw_eval_data_location']\n",
    "  transformed_train_data_location = args['transformed_train_data_location']\n",
    "  transformed_eval_data_location = args['transformed_eval_data_location']\n",
    "  transform_artifact_location = args['transform_artifact_location']\n",
    "  temporary_dir = args['temporary_dir']\n",
    "  runner = args['runner']\n",
    "    \n",
    "  print (\"Raw schema location: {}\".format(raw_schema_location))\n",
    "  print (\"Raw train data location: {}\".format(raw_train_data_location))\n",
    "  print (\"Raw evaluation data location: {}\".format(raw_eval_data_location))\n",
    "  print (\"Transformed train data location: {}\".format(transformed_train_data_location))\n",
    "  print (\"Transformed evaluation data location: {}\".format(transformed_eval_data_location))\n",
    "  print (\"Transform artifact location: {}\".format(transform_artifact_location))\n",
    "  print (\"Temporary directory: {}\".format(temporary_dir))\n",
    "  print (\"Runner: {}\".format(runner))\n",
    "  print (\"\")\n",
    "\n",
    "  # Load TFDV schema and create tft schema from it.\n",
    "  source_raw_schema = tfdv.load_schema_text(raw_schema_location)\n",
    "  raw_feature_spec = schema_utils.schema_as_feature_spec(source_raw_schema).feature_spec\n",
    "  raw_metadata = dataset_metadata.DatasetMetadata(\n",
    "    dataset_schema.from_feature_spec(raw_feature_spec))\n",
    "\n",
    "  with beam.Pipeline(runner, options=pipeline_options) as pipeline:\n",
    "    with tft_beam.Context(temporary_dir):\n",
    "      \n",
    "      converter = tft.coders.CsvCoder(column_names=CSV_COLUMNS, \n",
    "        schema=raw_metadata.schema)\n",
    "\n",
    "      ###### analyze & transform trainining data ###############################\n",
    "\n",
    "      # Read raw training csv data.\n",
    "      step = 'Train'\n",
    "      print (\"Reading and parsing raw training data...\")\n",
    "      raw_train_data = (\n",
    "        pipeline\n",
    "          | '{} - Read Raw Data'.format(step) >> beam.io.textio.ReadFromText(raw_train_data_location)\n",
    "          | '{} - Remove Empty Rows'.format(step) >> beam.Filter(lambda line: line)\n",
    "          | '{} - Decode CSV Data'.format(step) >> beam.Map(converter.decode)\n",
    "        )\n",
    "      \n",
    "      # Create a train dataset from the data and schema.\n",
    "      raw_train_dataset = (raw_train_data, raw_metadata)\n",
    "\n",
    "      # Analyze and transform raw_train_dataset to produced transformed_train_dataset and transform_fn.\n",
    "      print (\"Analyzing and transforming raw training data...\")\n",
    "      transformed_train_dataset, transform_fn = (\n",
    "        raw_train_dataset \n",
    "        | '{} - Analyze & Transform'.format(step) >> tft_beam.AnalyzeAndTransformDataset(\n",
    "              make_preprocessing_fn(source_raw_schema))\n",
    "      )\n",
    "  \n",
    "      # Get data and schema separately from the transformed_train_dataset.\n",
    "      transformed_train_data, transformed_metadata = transformed_train_dataset\n",
    "\n",
    "      # write transformed train data to sink.\n",
    "      print (\"Writing transformed training data...\")\n",
    "      _ = (\n",
    "        transformed_train_data \n",
    "          | '{} - Write Transformed Data'.format(step) >> beam.io.tfrecordio.WriteToTFRecord(\n",
    "            file_path_prefix=transformed_train_data_location,\n",
    "            file_name_suffix=\".tfrecords\",\n",
    "            coder=tft.coders.ExampleProtoCoder(transformed_metadata.schema))\n",
    "        )\n",
    "\n",
    "      ###### transform evaluation data #########################################\n",
    "\n",
    "      # Read raw training csv data.\n",
    "      step = 'Eval'\n",
    "      print (\"Reading and parsing raw evaluation data...\")\n",
    "      raw_eval_data = (\n",
    "        pipeline\n",
    "          | '{} - Read Raw Data'.format(step) >> beam.io.textio.ReadFromText(raw_eval_data_location)\n",
    "          | '{} - Remove Empty Rows'.format(step) >> beam.Filter(lambda line: line)\n",
    "          | '{} - Decode CSV Data'.format(step) >> beam.Map(converter.decode)\n",
    "        )\n",
    "      \n",
    "      # Create a eval dataset from the data and schema.\n",
    "      raw_eval_dataset = (raw_eval_data, raw_metadata)\n",
    "\n",
    "      # Transform eval data based on produced transform_fn.\n",
    "      print (\"Transforming raw evaluation data...\")\n",
    "      transformed_eval_dataset = (\n",
    "        (raw_eval_dataset, transform_fn) \n",
    "          | '{} - Transform'.format(step) >> tft_beam.TransformDataset()\n",
    "      )\n",
    "\n",
    "      # Get data from the transformed_eval_dataset.\n",
    "      transformed_eval_data, _ = transformed_eval_dataset\n",
    "\n",
    "      # Write transformed eval data to sink.\n",
    "      print (\"Writing transformed evaluation data...\")\n",
    "      _ = (\n",
    "          transformed_eval_data \n",
    "          | '{} - Write Transformed Data'.format(step) >> beam.io.tfrecordio.WriteToTFRecord(\n",
    "              file_path_prefix=transformed_eval_data_location,\n",
    "              file_name_suffix=\".tfrecords\",\n",
    "              coder=tft.coders.ExampleProtoCoder(transformed_metadata.schema))\n",
    "        )\n",
    "\n",
    "      ###### write transformation metadata #######################################################\n",
    "\n",
    "      # Write transform_fn.\n",
    "      print (\"Writing transform artifacts...\")\n",
    "      _ = (\n",
    "          transform_fn \n",
    "          | 'Write Transform Artifacts' >> tft_beam.WriteTransformFn(\n",
    "              transform_artifact_location)\n",
    "      )\n",
    "\n",
    "          "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "2QQzYPXGhqwP"
   },
   "source": [
    "### 1.4 Run data tranformation pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -m pip freeze | grep tensorflow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting setup.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile setup.py\n",
    "from setuptools import setup, find_packages\n",
    "\n",
    "setup(name='tfxdemo',\n",
    "      version='1.0',\n",
    "      packages=find_packages(),\n",
    "      install_requires=['tensorflow-transform==0.13.0', \n",
    "                        'tensorflow-data-validation==0.13.1'],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Snd1a6_yhq9H"
   },
   "outputs": [],
   "source": [
    "#runner = 'DirectRunner'; OUTPUT_DIR = 'output/flights/tfx'   # on-prem\n",
    "#runner = 'DirectRunner'; OUTPUT_DIR = 'gs://{}/flights/tfx'.format(BUCKET)  # hybrid\n",
    "runner = 'DataflowRunner'; OUTPUT_DIR = 'gs://{}/flights/tfx'.format(BUCKET)  # on GCP\n",
    "\n",
    "RAW_SCHEMA_LOCATION = 'raw_schema.pbtxt'\n",
    "TRANSFORM_ARTIFACTS_DIR = os.path.join(OUTPUT_DIR,'transform')\n",
    "TRANSFORMED_DATA_DIR = os.path.join(OUTPUT_DIR,'transformed')\n",
    "TEMP_DIR = os.path.join(OUTPUT_DIR, 'tmp')\n",
    "\n",
    "args = {\n",
    "    \n",
    "    'runner': runner,\n",
    "\n",
    "    'raw_schema_location': RAW_SCHEMA_LOCATION,\n",
    "\n",
    "    'raw_train_data_location': TRAIN_DATA_PATTERN,\n",
    "    'raw_eval_data_location': EVAL_DATA_PATTERN,\n",
    "\n",
    "    'transformed_train_data_location':  os.path.join(TRANSFORMED_DATA_DIR, \"train\"),\n",
    "    'transformed_eval_data_location':  os.path.join(TRANSFORMED_DATA_DIR, \"eval\"),\n",
    "    'transform_artifact_location':  TRANSFORM_ARTIFACTS_DIR,\n",
    "    \n",
    "    'temporary_dir': TEMP_DIR,\n",
    "    'project': PROJECT,\n",
    "    'temp_location': TEMP_DIR,\n",
    "    'staging_location': os.path.join(OUTPUT_DIR, 'staging'),\n",
    "    'max_num_workers': 8,\n",
    "    'save_main_session': False,\n",
    "    'setup_file': './setup.py'\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 374
    },
    "colab_type": "code",
    "id": "qNyztdP3jjaz",
    "outputId": "2c75be8e-cfe5-4834-d0ae-2f07bfecf373"
   },
   "outputs": [],
   "source": [
    "if tf.gfile.Exists(OUTPUT_DIR):\n",
    "  print(\"Removing {} contents...\".format(OUTPUT_DIR))\n",
    "  tf.gfile.DeleteRecursively(OUTPUT_DIR)\n",
    "\n",
    "tf.logging.set_verbosity(tf.logging.ERROR)\n",
    "print(\"Running TF Transform pipeline...\")\n",
    "print()\n",
    "run_pipeline(args)\n",
    "print()\n",
    "print(\"Pipeline is done.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "PfQO1IjmsZQb"
   },
   "source": [
    "### Check the outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 153
    },
    "colab_type": "code",
    "id": "SmBSS38GsZbx",
    "outputId": "8876bb26-57ab-40dd-aac9-35210344e09b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gs://cloud-training-demos-ml/flights/tfx/\n",
      "\n",
      "gs://cloud-training-demos-ml/flights/tfx/staging/:\n",
      "gs://cloud-training-demos-ml/flights/tfx/staging/beamapp-jupyter-0402043210-224816.1554179530.224958/\n",
      "\n",
      "gs://cloud-training-demos-ml/flights/tfx/tmp/:\n",
      "gs://cloud-training-demos-ml/flights/tfx/tmp/\n",
      "gs://cloud-training-demos-ml/flights/tfx/tmp/beamapp-jupyter-0402043210-224816.1554179530.224958/\n",
      "gs://cloud-training-demos-ml/flights/tfx/tmp/tftransform_tmp/\n",
      "\n",
      "gs://cloud-training-demos-ml/flights/tfx/transform/:\n",
      "gs://cloud-training-demos-ml/flights/tfx/transform/\n",
      "gs://cloud-training-demos-ml/flights/tfx/transform/transform_fn/\n",
      "gs://cloud-training-demos-ml/flights/tfx/transform/transformed_metadata/\n",
      "\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/:\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00000-of-00008.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00001-of-00008.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00002-of-00008.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00003-of-00008.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00004-of-00008.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00005-of-00008.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00006-of-00008.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/eval-00007-of-00008.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00000-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00001-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00002-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00003-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00004-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00005-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00006-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00007-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00008-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00009-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00010-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00011-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00012-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00013-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00014-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00015-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00016-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00017-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00018-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00019-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00020-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00021-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00022-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00023-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00024-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00025-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00026-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00027-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00028-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00029-of-00031.tfrecords\n",
      "gs://cloud-training-demos-ml/flights/tfx/transformed/train-00030-of-00031.tfrecords\n"
     ]
    }
   ],
   "source": [
    "!gcloud storage ls $OUTPUT_DIR/*"   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 136
    },
    "colab_type": "code",
    "id": "sCexlD9EtV2I",
    "outputId": "5713ef76-ccc6-4578-9b8f-72df71ff62a8"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gs://cloud-training-demos-ml/flights/tfx/transform/transform_fn/\n",
      "gs://cloud-training-demos-ml/flights/tfx/transform/transform_fn/saved_model.pb\n",
      "gs://cloud-training-demos-ml/flights/tfx/transform/transform_fn/assets/\n",
      "gs://cloud-training-demos-ml/flights/tfx/transform/transform_fn/variables/\n"
     ]
    }
   ],
   "source": [
    "!gcloud storage ls $OUTPUT_DIR/transform/transform_fn"   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "lYsjUQt2tbEV"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "feature {\n",
      "  name: \"arr_lat_scaled\"\n",
      "  type: FLOAT\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"arr_lon_scaled\"\n",
      "  type: FLOAT\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"avg_arr_delay_scaled\"\n",
      "  type: FLOAT\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"avg_dep_delay_scaled\"\n",
      "  type: FLOAT\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"carrier_integerized\"\n",
      "  type: INT\n",
      "  int_domain {\n",
      "    min: -1\n",
      "    max: 13\n",
      "    is_categorical: true\n",
      "  }\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"dep_delay_scaled\"\n",
      "  type: FLOAT\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"dep_lat_scaled\"\n",
      "  type: FLOAT\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"dep_lon_scaled\"\n",
      "  type: FLOAT\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"dest_integerized\"\n",
      "  type: INT\n",
      "  int_domain {\n",
      "    min: -1\n",
      "    max: 321\n",
      "    is_categorical: true\n",
      "  }\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"distance_bucketized\"\n",
      "  type: INT\n",
      "  int_domain {\n",
      "    min: 0\n",
      "    max: 4\n",
      "    is_categorical: true\n",
      "  }\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"distance_scaled\"\n",
      "  type: FLOAT\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"ontime\"\n",
      "  type: FLOAT\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"origin_integerized\"\n",
      "  type: INT\n",
      "  int_domain {\n",
      "    min: -1\n",
      "    max: 321\n",
      "    is_categorical: true\n",
      "  }\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n",
      "feature {\n",
      "  name: \"taxiout_scaled\"\n",
      "  type: FLOAT\n",
      "  presence {\n",
      "    min_fraction: 1.0\n",
      "  }\n",
      "  shape {\n",
      "    dim {\n",
      "      size: 1\n",
      "    }\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "!gcloud storage cat $OUTPUT_DIR/transform/transformed_metadata/schema.pbtxt"   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Gsi_Hsh89Cl7"
   },
   "source": [
    "## License"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0fOWx1yI9Dyn"
   },
   "source": [
    "Copyright 2019 Google LLC\n",
    "\n",
    "Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "you may not use this file except in compliance with the License.\n",
    "You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.\n",
    "\n",
    "Unless required by applicable law or agreed to in writing, software\n",
    "distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "See the License for the specific language governing permissions and\n",
    "limitations under the License.\n",
    "\n",
    "---\n",
    "This is not an official Google product. The sample code provided for educational purposes only.\n",
    "---"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "02-tfx_end_to_end",
   "provenance": [],
   "version": "0.3.2"
  },
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
